NSF PAR Search | NSF Public Access Repository

SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation

https://doi.org/10.1109/IPDPS64566.2025.00109

Li, Zecheng; Shivakumar, Shruti; Li, Jiajia; Kannan, Ramakrishnan (June 2025, IEEE)

Free, publicly-accessible full text available June 3, 2026
On Rank Selection for Nonnegative Matrix Factorization

https://doi.org/10.1109/BigData62323.2024.10825324

Eswar, Srinivas; Hayashi, Koby; Cobb, Benjamin; Kannan, Ramakrishnan; Ballard, Grey; Vuduc, Richard; Park, Haesun (December 2024, IEEE)

Rank selection, i.e. the choice of factorization rank, is the first step in constructing Nonnegative Matrix Factorization (NMF) models. It is a long-standing problem which is not unique to NMF, but arises in most models which attempt to decompose data into its underlying components. Since these models are often used in the unsupervised setting, the rank selection problem is further complicated by the lack of ground truth labels. In this paper, we review and empirically evaluate the most commonly used schemes for NMF rank selection.
Full Text Available
Distributed-Memory Parallel JointNMF

https://doi.org/10.1145/3577193.3593733

Eswar, Srinivas; Cobb, Benjamin; Hayashi, Koby; Kannan, Ramakrishnan; Ballard, Grey; Vuduc, Richard; Park, Haesun (June 2023, Proceedings of the 37th International Conference on Supercomputing)

Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem based on Alternating Nonnegative Least Squares, Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms using a single processor grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated Alternating Nonnegative Least Squares (ANLS) and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers that consists of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
Full Text Available
Fragile Earth: Accelerating Progress towards Equitable Sustainability

https://doi.org/10.1145/3447548.3469484

Abe, Naoki; Buckingham, Kathleen; Dilkina, Bistra; Eftelioglu, Emre; Ganguly, Auroop; Hodson, James; Kannan, Ramakrishnan (August 2021, Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
Full Text Available
PLANC: Parallel Low-rank Approximation with Nonnegativity Constraints

https://doi.org/10.1145/3432185

Eswar, Srinivas; Hayashi, Koby; Ballard, Grey; Kannan, Ramakrishnan; Matheson, Michael A.; Park, Haesun (June 2021, ACM Transactions on Mathematical Software)
We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
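The abstract above names multiplicative updating as one of the algorithm families the package supports. As a rough illustration only, the sketch below runs the classic Lee-Seung multiplicative updates for a small dense matrix on a single process in pure Python. It is not PLANC's API or its distributed algorithm; the function names (`nmf_mu`, `frob_error`), the damping constant `eps`, and the toy matrix are all assumptions made for this example.

```python
import random

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(A, W, H):
    """Frobenius norm of the residual A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def nmf_mu(A, rank, iters=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix A (m x n) by W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random start so no entry is stuck at zero from the outset.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(rh, rp, rq)]
             for rh, rp, rq in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(rw, rp, rq)]
             for rw, rp, rq in zip(W, num, den)]
    return W, H

# Toy nonnegative input (hypothetical data, chosen to have rank 2).
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0]]
W, H = nmf_mu(A, rank=2)
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative without any explicit projection, which is the main appeal of this family of methods.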
Full Text Available
Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization

https://doi.org/10.1109/HiPC50609.2020.00028

Manning, Lawton; Ballard, Grey; Kannan, Ramakrishnan; Park, Haesun (December 2020, 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC))
Full Text Available
Tensorized Feature Spaces for Feature Explosion

https://doi.org/10.1109/ICPR48806.2021.9412320

Pasricha, Ravdeep S.; Devineni, Pravallika; Papalexakis, Evangelos E.; Kannan, Ramakrishnan (January 2021, International Conference on Pattern Recognition (ICPR) 2020)
Full Text Available
Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization

https://doi.org/10.1109/SC41405.2020.00078

Eswar, Srinivas; Hayashi, Koby; Ballard, Grey; Kannan, Ramakrishnan; Vuduc, Richard; Park, Haesun (November 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis)
Full Text Available
A supernodal all-pairs shortest path algorithm

https://doi.org/10.1145/3332466.3374533

Sao, Piyush; Kannan, Ramakrishnan; Gera, Prasun; Vuduc, Richard (February 2020, Symposium on Principles and Practice of Parallel Programming)
We show how to exploit graph sparsity in the Floyd-Warshall algorithm for the all-pairs shortest path (APSP) problem. Floyd-Warshall is an attractive choice for APSP on high-performing systems due to its structural similarity to solving dense linear systems and matrix multiplication. However, if sparsity of the input graph is not properly exploited, Floyd-Warshall will perform unnecessary asymptotic work and thus may not be a suitable choice for many input graphs. To overcome this limitation, the key idea in our approach is to use the known algebraic relationship between Floyd-Warshall and Gaussian elimination, and import several algorithmic techniques from sparse Cholesky factorization, namely, fill-in reducing ordering, symbolic analysis, supernodal traversal, and elimination tree parallelism. When combined, these techniques reduce computation, improve locality and enhance parallelism. We implement these ideas in an efficient shared memory parallel prototype that is orders of magnitude faster than an efficient multi-threaded baseline Floyd-Warshall that does not exploit sparsity. Our experiments suggest that the Floyd-Warshall algorithm can compete with Dijkstra’s algorithm (the algorithmic core of Johnson’s algorithm) for several classes of sparse graphs.
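For orientation, the dense Floyd-Warshall baseline that the abstract contrasts against can be sketched in a few lines. This is a plain textbook version in pure Python, not the supernodal algorithm from the paper; the function name `floyd_warshall`, the adjacency-matrix layout, and the four-vertex `graph` are assumptions for illustration. The only nod to sparsity here is skipping rows with no path to the pivot vertex, which is far simpler than the ordering and elimination-tree techniques described above.

```python
INF = float("inf")

def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for a dense distance matrix.

    dist[i][j] is the weight of edge i -> j, INF if there is no edge,
    and 0 on the diagonal. Assumes the graph has no negative cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy, leave the input untouched
    for k in range(n):            # k is the pivot (intermediate) vertex
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == INF:
                continue          # no path i -> k, so row i cannot improve via k
            di = d[i]
            for j in range(n):
                cand = dik + dk[j]
                if cand < di[j]:
                    di[j] = cand
    return d

# Hypothetical directed graph: 0->1 (3), 0->2 (7), 1->2 (1), 2->3 (2).
graph = [
    [0,   3,   7,   INF],
    [INF, 0,   1,   INF],
    [INF, INF, 0,   2],
    [INF, INF, INF, 0],
]
shortest = floyd_warshall(graph)
```

The triple loop has the same shape as Gaussian elimination without pivoting, with (min, +) in place of (+, ×), which is the algebraic relationship the abstract refers to.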
Full Text Available
Scalable Knowledge Graph Analytics at 136 Petaflop/s

https://doi.org/10.1109/SC41405.2020.00010

Kannan, Ramakrishnan; Sao, Piyush; Lu, Hao; Herrmannova, Drahomira; Thakkar, Vijay; Patton, Robert; Vuduc, Richard; Potok, Thomas (November 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis)
We are motivated by newly proposed methods for data mining large-scale corpora of scholarly publications, such as the full biomedical literature, which may consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover how concepts relate to one another. They construct graph representations from annotated text databases and then formulate the relationship-mining problem as one of computing all-pairs shortest paths (APSP), which becomes a significant bottleneck. In this context, we present a new high-performance algorithm and implementation of the Floyd-Warshall algorithm for distributed-memory parallel computers accelerated by GPUs, which we call DSNAPSHOT (Distributed Accelerated Semiring All-Pairs Shortest Path). For our largest experiments, we ran DSNAPSHOT on a connected input graph with millions of vertices using 4,096 nodes (24,576 GPUs) of the Oak Ridge National Laboratory's Summit supercomputer system. We find DSNAPSHOT achieves a sustained performance of 136×10^15 floating-point operations per second (136 petaflop/s) at a parallel efficiency of 90% under weak scaling and, in absolute speed, 70% of the best possible performance given our computation (in the single-precision tropical semiring or “min-plus” algebra). Looking forward, we believe this novel capability will enable the mining of scholarly knowledge corpora when embedded and integrated into artificial intelligence-driven natural language processing workflows at scale.
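The "min-plus" algebra mentioned in the abstract replaces multiplication with addition and addition with minimum, so a matrix "product" combines path lengths. The sketch below illustrates that algebra only: it computes APSP by repeated min-plus squaring, which is a different (and asymptotically more expensive) route than the Floyd-Warshall formulation DSNAPSHOT uses. The names `min_plus` and `apsp_by_squaring` and the sample matrix are assumptions for this example.

```python
INF = float("inf")

def min_plus(A, B):
    """Tropical (min, +) matrix product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    inner, cols = len(B), len(B[0])
    return [[min(row[k] + B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def apsp_by_squaring(D):
    """All-pairs shortest paths by repeated min-plus squaring.

    D must have 0 on the diagonal and INF for missing edges, so that D
    already holds the shortest paths using at most one edge.
    """
    n = len(D)
    max_edges = 1                 # D currently covers paths of <= max_edges edges
    while max_edges < n - 1:      # shortest paths need at most n - 1 edges
        D = min_plus(D, D)
        max_edges *= 2
    return D

# Hypothetical directed graph: 0->1 (3), 0->2 (7), 1->2 (1), 2->3 (2).
D = [
    [0,   3,   7,   INF],
    [INF, 0,   1,   INF],
    [INF, INF, 0,   2],
    [INF, INF, INF, 0],
]
paths = apsp_by_squaring(D)
```

Each squaring doubles the number of edges a path may use, so about log2(n) min-plus products suffice. Each product has the same loop structure as ordinary matrix multiplication, which is why this kind of computation maps well onto GPU hardware.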
Full Text Available
